ENH/BUG groupby nth now filters, works with DataFrames #6569

hayd · 2014-03-07T17:50:43Z

partial for #5264

In [101]: df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B'])

In [102]: g = df.groupby('A')

In [103]: g.nth(0)
Out[103]:
   A   B
0  1 NaN
2  5   6

In [104]: g.nth(1)
Out[104]:
   A  B
1  1  4

In [105]: g.nth(-1)
Out[105]:
   A  B
1  1  4
2  5  6

In [106]: g.nth(0, dropna='any')  # old behaviour-like
Out[106]:
   B
A
1  4
5  6

In [107]: g.nth(1, dropna='any')  # old behaviour-like
Out[107]:
    B
A
1 NaN
5 NaN
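
For anyone reading along without a pandas checkout, the two modes shown above can be mimicked in plain Python. This is only a rough sketch of the semantics, not the code in this PR: `group_positions`, `nth_filter` and `nth_dropna` are made-up names, the rows are the same three rows as `df`, and `None` stands in for the NaN that pandas shows for a missing group value.

```python
import math

# Same data as the session above: (A, B) pairs, A is the group key.
rows = [(1, math.nan), (1, 4.0), (5, 6.0)]

def group_positions(rows):
    """Map each group key to the row positions belonging to it, in order."""
    groups = {}
    for pos, (key, _) in enumerate(rows):
        groups.setdefault(key, []).append(pos)
    return groups

def nth_filter(rows, n):
    """Filter-style nth: original positions of each group's nth row."""
    picked = []
    for positions in group_positions(rows).values():
        # Groups that are too short simply contribute no row.
        if -len(positions) <= n < len(positions):
            picked.append(positions[n])
    return sorted(picked)

def nth_dropna(rows, n):
    """Old-style nth: drop NaN rows first, then one value per group key."""
    kept = [(key, b) for key, b in rows if not math.isnan(b)]
    result = {key: None for key, _ in rows}  # None stands in for NaN
    for key, positions in group_positions(kept).items():
        if -len(positions) <= n < len(positions):
            result[key] = kept[positions[n]][1]
    return result

print(nth_filter(rows, 0))   # [0, 2]
print(nth_filter(rows, 1))   # [1]
print(nth_filter(rows, -1))  # [1, 2]
print(nth_dropna(rows, 0))   # {1: 4.0, 5: 6.0}
print(nth_dropna(rows, 1))   # {1: None, 5: None}
```

The filter form keeps the original row labels and drops groups that are too short, while the dropna form always returns one entry per group key, which is why `nth(1, dropna='any')` shows NaN for both groups.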

hayd · 2014-03-07T17:53:25Z

Also note the old behaviour was not stable/correct for negative n (now fixed with this PR, including with dropna):

In [9]: g.nth(-3,)
Out[9]:
               B
A
1  2.144760e-314
5  2.124748e-314

In [10]: g.B.nth(-3,)
Out[10]:
A
1    2.144760e-314
5    2.144337e-314
Name: B, dtype: float64

jreback · 2014-03-07T17:57:04Z

If you get around to it: I suspect the new method is MUCH faster than the old one, so maybe add a vbench.

hayd · 2014-03-07T18:10:03Z

Will add a vbench. It is much faster, except that applying it to a DataFrame with dropna (old-style) is a little slower, but that was previously borked.

Um, obviously there is overlap with the first and last methods: they can be got with nth(0) and nth(-1), but I've not tested the differences yet... you reckon these should change too?

jreback · 2014-03-07T18:18:02Z

yes I think you should blow away first/last code and just alias them to nth(0) and nth(-1).

reminds me: pls add some tests that deal with different types (because first/last have this convert arg... though not sure why)

jreback · 2014-03-07T18:19:44Z

though maybe nth(0) (first, i.e. iloc[0]) deserves a fast-path, as it doesn't need the machinery of cumcount
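
To make the cumcount idea concrete, here is a minimal pure-Python sketch, assuming nth is built as "number the rows within each group, then keep the rows whose number matches n". The helpers `cumcount` and `nth_mask` below are illustrative stand-ins written for this comment, not the pandas internals.

```python
from collections import defaultdict

def cumcount(keys, ascending=True):
    """Number each element within its group: 0, 1, 2, ... in order of appearance.

    With ascending=False the numbering runs backwards, so the last element
    of each group gets 0.
    """
    seen = defaultdict(int)
    order = keys if ascending else reversed(keys)
    counts = []
    for key in order:
        counts.append(seen[key])
        seen[key] += 1
    return counts if ascending else counts[::-1]

def nth_mask(keys, n):
    """Boolean mask selecting each group's nth row (negative n counts from the end)."""
    if n >= 0:
        return [c == n for c in cumcount(keys)]
    return [c == -n - 1 for c in cumcount(keys, ascending=False)]

keys = [1, 1, 5]  # column A from the example in this PR
print(cumcount(keys))                   # [0, 1, 0]
print(cumcount(keys, ascending=False))  # [1, 0, 0]
print(nth_mask(keys, 0))                # [True, False, True]
print(nth_mask(keys, -1))               # [False, True, True]
```

Under this scheme nth(0) still pays for a full count over every row, which is the cost a dedicated fast path would avoid.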

hayd · 2014-03-07T18:39:11Z

there was/is a weird test for types of first/last/nth, I tweaked it a little but it is still there...

I can iterate tests over a few of the same df (but with different column types), is that what you mean?

Yea, re fast path: will see how they compare to the cumcount version for now...

jreback · 2014-03-07T18:40:51Z

pandas/tests/test_groupby.py

@@ -165,10 +164,10 @@ def test_first_last_nth(self):
        grouped['B'].last()
        grouped['B'].nth(0)

-        self.df['B'][self.df['A'] == 'foo'] = np.nan


hmm.....this should have actually raised a SettingWithCopy (as the test suite sets it to raise)...weird

hayd · 2014-03-07T19:48:53Z

Added a vbench: it is about 40 times faster, not counting the setup of the groupby (which is included in the bench)

jreback · 2014-03-07T19:50:17Z

awesome!

ENH/BUG groupby nth now filters, works with DataFrames

c444c73

jreback added Bug labels Mar 7, 2014

jreback added this to the 0.14.0 milestone Mar 7, 2014

TomAugspurger mentioned this pull request Mar 7, 2014

BUG: groupby sub-selection ignored with some methods #5264

Closed

31 tasks

jreback reviewed Mar 7, 2014

TST add vbench for groupby nth

feaca40

hayd added a commit that referenced this pull request Mar 7, 2014

Merge pull request #6569 from hayd/groupby_nth

6e758b7

ENH/BUG groupby nth now filters, works with DataFrames

hayd merged commit 6e758b7 into pandas-dev:master Mar 7, 2014

hayd deleted the groupby_nth branch March 7, 2014 20:24

jorisvandenbossche mentioned this pull request May 5, 2014

API: update nth to use the _set_selection_from_grouper makes first==nth(0) and last==nth(-1) #7044

Merged

jorisvandenbossche mentioned this pull request Sep 13, 2015

PERF: improves performance in GroupBy.cumcount #11039

Closed
